R Data Science: Spatial Data Science 2

steppe

2/14/2022

Assigned Reading: Chapter 3 of Spatial Data Science

Pioneer Mountains, Idaho

Pioneer Mountains, Idaho

Topology

Theoretical Relations

DE-9IM, By Krauss

DE-9IM, By Krauss

In the example in the Upper Middle panel

The dimensions which contain the intersection of the ‘Interior’ of ‘A’ to the ‘Boundary’ of ‘B’ is equal to a line.

\[ dim[I(a)∩B(b)] = 1 \]


\[ \text{∩ is the 'Intersection'} \]


In the example in the Upper Middle panel

The dimensions which contain the intersection of the ‘Interior’ of ‘A’ to the ‘Boundary’ of ‘B’ is equal to a line.

Applications

When identifying solutions to a problem we are generally interested in how features affect each other. This schema provides us a framework for organizing our area of analyses, and subsetting the appropriate data.

We are generally interested in :





Geometric Operations on Spatial Predicates

Theoretical Properties

Spatial Predicates after Egenhofer and Herring (1990)

Spatial Predicates after Egenhofer and Herring (1990)

Applications

st_disjoint(A, B_disjoint, sparse = F): TRUE
st_touches(A, B_touches, sparse = F): TRUE
st_equals(A, B_overlap, sparse = F): TRUE
st_covers(A, B_covers, sparse = F): TRUE
st_contains(A, B_contains, sparse = F): TRUE
st_equals(A, B_equals, sparse = F): TRUE
st_equals(A, B_disjoint, sparse = F): FALSE
st_intersects(A, B_touches, sparse = F): TRUE
st_intersects(A, B_overlap, sparse = F): TRUE
st_intersects(A, B_covers, sparse = F): TRUE
st_intersects(A, B_contains, sparse = F): TRUE
st_intersects(A, B_equals, sparse = F): TRUE
st_within(A, B_contains, sparse = F): FALSE
st_within(B_contains, A, sparse = F): TRUE
st_coveredby(A, B_equals, sparse = F): TRUE

Spatial Join

nghbrhd_parks <- st_join(chi_neighb, chi_parks, 
               join = st_intersects, 
               left = TRUE 
               ) %>% 
  count(pri_neigh) 

Assorted Spatial Vector Operations

This is really a grab bag of what I and others seem to use most the often. We are going to focus on vector data.

Import a Vector file

files <- paste0('spatial_lecture_data/Chicago_Neighborhoods/' ,
                list.files('spatial_lecture_data/Chicago_Neighborhoods', ".shp"))
chi_neighb <- read_sf(files)
rm(files)

Convert to and from sp objects

us_states.sf <- st_as_sf(us_states.sp)
class(us_states.sf)
[1] "sf"         "data.frame"
us_states.sp <- as(us_states.sf, 'Spatial')
class(us_states.sp)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"

Make tabular point data spatial

museums.sf <- st_as_sf(museums, 
                       coords = c(x = 'longitude', y = 'latitude'), 
                       crs = 4326, 
                       remove = F
                       )
attribute latitude longitude geometry
Field Museum 41.86666 -87.61643 POINT (-87.61643 41.86666)
Science and Industry 41.79006 -87.58227 POINT (-87.58227 41.79006)

st transform

st_crs(museums.sf)
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Horizontal component of 3D system."],
##         AREA["World."],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]
museums.conus_albers <- st_transform(museums.sf, crs = 5070)
[1] "EPSG:5070"

Remove simple feature list column

museums <- st_drop_geometry(museums.sf)
st_geometry(museums.sf) <- NULL

Make a geometry valid as a Simple Feature

p1 = st_as_sfc("POLYGON((0 0, 0 10, 10 0, 10 10, 0 0))")
# create a geometry which will not be valid
st_is_valid(p1)
st_is_valid(p1, reason = TRUE)
st_is_valid(p1): FALSE
st_is_valid(p1, reason = TRUE): Self-intersection[5 5]
p2 <- st_make_valid(p1)
If we call class(p1), we can see that our original simple feature collection (sfc) contained: sfc_POLYGON
If we call class(p2), we can see that our valid simple feature collection (sfc) contains: sfc_MULTIPOLYGON

Aggregate Simple Features via ‘group_by’ & ‘summarize’

NAME REGION geometry
Alabama South MULTIPOLYGON (((-88.20006 3…
Arizona West MULTIPOLYGON (((-114.7196 3…
Colorado West MULTIPOLYGON (((-109.0501 4…
Connecticut Norteast MULTIPOLYGON (((-73.48731 4…
Florida South MULTIPOLYGON (((-81.81169 2…
Georgia South MULTIPOLYGON (((-85.60516 3…
regions <- us_states %>%
  group_by(REGION) %>% 
  summarize(geometry = st_union(geometry))

REGION geometry
Norteast MULTIPOLYGON (((-70.11867 4…
Midwest MULTIPOLYGON (((-85.48703 4…
South MULTIPOLYGON (((-81.68524 2…
West MULTIPOLYGON (((-118.4887 3…

Area Calculations

st_area(chi_neighb)
Units: [m^2]
[1]  4498293.6   200563.5  3016684.2   972375.8 11596322.7

These data are always returned in units of meters, regardless of the coordinate system specifications. However, different Coordinate Systems will give different results

[1] 2
integer(0)
81852.82 [m^2]
42261063 [m^2]

integer(0)

When we use the ‘which’ logical test, we see that all areas in the results from both sets of area calculations are different. When we plot the values we see that both sets have nearly perfect correlation.

But neither of these values are as accurate as they could be. We want to use an Equal Area projected coordinate system instead.

Here we use the NAD83 Conus Albers, a favorite of the United States Geological Survey (USGS).

equal_areas_proj <- chi_neighb%>% 
  st_transform(5070) %>% 
# We want to use an equal area projection! 
  st_area()

equal_areas_proj <- sort(equal_areas_proj)

While metric values are easy to convert by ‘hand’, we can also convert them using the ‘units’ package.

units(equal_areas_proj) <- units::make_units(km^2)
head(equal_areas_proj)
Units: [km^2]
[1] 0.08197556 0.15334606 0.20086315 0.25781886 0.31268310 0.32418247
Size of the Largest and Smallest Neighborhoods in Chicago
Smallest
Largest
Neighborhood Area (m^2) Neighborhood Area (m^2)
Magnificent Mile 81975.56 O’Hare 34545390
Greektown 153346.06 South Deering 28222390
Printers Row 200863.15 Englewood 16127577
Millenium Park 257818.86 Austin 15796991
Boystown 312683.10 Hegewisch 13559961

Length calculations

st_length(NU_campuses)
18578.52 [m]

Boundary calculations

rogers_park <- chi_neighb %>% 
  filter(pri_neigh == 'Rogers Park')

rogers_park <- st_cast(rogers_park, to = 'MULTILINESTRING')
rogers_park <- st_cast(rogers_park, to = 'LINESTRING')
st_length(rogers_park)
10370.89 [m]

Centroids & Point on Surface

chi_cent <- st_centroid(chi_neighb)
chi_pos <- st_point_on_surface(chi_neighb)

Combine & Union Features

chi_combine <- st_combine(chi_neighb)
chi_union <- st_union(chi_neighb)

Cast

chi_sub <- chi_neighb[sample(size = 30, 1:nrow(chi_neighb)),]
chi_union <- st_union(chi_sub)
chi_cast <- st_cast(chi_union, to = "POLYGON")

Original data
Neighborhood geometry
Hegewisch MULTIPOLYGON (((-87.52462 4…
Chicago Lawn MULTIPOLYGON (((-87.67815 4…
South Shore MULTIPOLYGON (((-87.5921 41…
East Village MULTIPOLYGON (((-87.67705 4…
Ukrainian Village MULTIPOLYGON (((-87.67705 4…
Lower West Side MULTIPOLYGON (((-87.63516 4…
Unioned Data
geometry
MULTIPOLYGON (((-87.52501 4…
Data Cast to Polygons
geometry
POLYGON ((-87.52501 41.6918…
POLYGON ((-87.67705 41.8959…
POLYGON ((-87.54612 41.7231…
POLYGON ((-87.62761 41.8745…
POLYGON ((-87.83658 41.9864…
POLYGON ((-87.94001 41.9981…

Buffers & Convex Hulls

chi_buffer_5k <- st_buffer(chi_union, dist = 3000)

chi_ch_by_neigh <- st_convex_hull(chi_neighb)
chi_ch_cit <- st_convex_hull(chi_union)
chi_ch_buf <- st_convex_hull(st_buffer(chi_union, dist = 3000))
Buffer

Buffer

Convex Hull of Chicago Neighborhoods, The City, and a buffer of 3km

Convex Hull of Chicago Neighborhoods, The City, and a buffer of 3km

Measuring Distances

Theory: Remember the earth is a ellipsoid, distances must be calculated along the earths curved surface

Great Circle Distance, by: ChecCheDaWaff

Great Circle Distance, by: ChecCheDaWaff

AM_FM <- st_distance(AM, FM)
AM_SM <- st_distance(AM, SM)
FM_SM <- st_distance(FM, SM)
Distance (Km) Journey
1144.3 American to Field
334.7 American to Smithsonian
955.4 Field to Smithonian

We can visualize this with some help from Charlie Joey Hadley whom some sf compliant great circle distance code

gcr_distance <- st_length(great_circle_routes)
neighborhood_centroid_dist <- st_distance(st_centroid(chi_neighb))
The five most Central Chicago Neighborhoods
Neighborhood Distance
Little Italy, UIC 10291.97
Lower West Side 10350.90
West Loop 10426.27
Greektown 10455.72
United Center 10498.78

An introduction to some types of Spatial Analyses

Voronoi Polygons

Identifying Distance Based Neighbors

Neighbour list object:
Number of regions: 81 
Number of nonzero links: 1780 
Percentage nonzero weights: 27.13001 
Average number of links: 21.97531 
2 regions with no links:
13 43
Link number distribution:

 0  4  5  6  7  8  9 15 17 18 20 21 22 23 24 25 26 27 28 29 30 32 33 34 
 2  1  1  1  4  3  2  2  2  1  5  3  1  2  9 10  8  9  2  3  3  1  4  2 
1 least connected region:
81 with 4 links
2 most connected regions:
26 27 with 34 links

Kernel Density Estimation of a Point Pattern

Cluster a Point Pattern

Wrapping up the Spatial Data Science Module:

Future Bonus SDS Office Hours Wednesday Night at 5:00 - 6:00 in F-413.

Notes on this Lab My lecture notes are in the R script I used to generate all of the novel figures for this presentation. This script is much longer and more complex than Lecture 1’s script. Likewise this presentation is an .HTML file and can be launched from your computer (it was rendered directly from R using the script).

Works Cited

https://cran.r-project.org/web/packages/gstat/vignettes/gstat.pdf Accessed 1.21.2021

Bivand, R. https://cran.r-project.org/web/packages/spdep/vignettes/nb_sf.html Accessed 1.17.2021

Clementini, Eliseo; Di Felice, Paolino; van Oosterom, Peter (1993). “A small set of formal topological relationships suitable for end-user interaction”. In Abel, David; Ooi, Beng Chin (eds.). Advances in Spatial Databases: Third International Symposium, SSD ’93 Singapore, June 23–25, 1993 Proceedings. Lecture Notes in Computer Science. 692/1993. Springer. pp. 277–295. doi:10.1007/3-540-56869-7_16. ISBN 978-3-540-56869-8. Accessed 1.16.2021

Egenhofer, M.J.; Herring, J.R. (1990). “A Mathematical Framework for the Definition of Topological Relationships”

Ester, Martin; Kriegel, Hans-Peter; Sander, Jörg; Xu, Xiaowei (1996). Simoudis, Evangelos; Han, Jiawei; Fayyad, Usama M. (eds.). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. pp. 226–231. CiteSeerX 10.1.1.121.9220. ISBN 1-57735-004-9.

Hadley, C.J. https://www.findingyourway.io/blog/2018/02/28/2018-02-28_great-circles-with-sf-and-leaflet/ Accessed 1.16.2021

Gimond, M. https://mgimond.github.io/Spatial/point-pattern-analysis-in-r.html Accessed 1.22.2022

Moreno, M., Basille, M. https://r-spatial.org/r/2018/10/25/ggplot2-sf-2.html Accessed 1.22.2022

Dennett, A. https://rstudio-pubs-static.s3.amazonaws.com/126356_ef7961b3ac164cd080982bc743b9777e.html Accessed 1.21.2022

Packages cited:

c("raster", "sp", "sf", "tidyverse", "terra", 'spData', 'spdep', 'gstat', 'spatstat', 'fields' ) %>%
  map(citation) %>%
  print(style = "text", na.print = '')
[[1]]
Hijmans R (2022). _raster: Geographic Data Analysis and Modeling_. R
package version 3.5-15, <URL:
https://CRAN.R-project.org/package=raster>.

[[2]]
Pebesma EJ, Bivand RS (2005). "Classes and methods for spatial data in
R." _R News_, *5*(2), 9-13. <URL:
https://CRAN.R-project.org/doc/Rnews/>.

Bivand RS, Pebesma E, Gomez-Rubio V (2013). _Applied spatial data
analysis with R, Second edition_. Springer, NY. <URL:
https://asdar-book.org/>.

[[3]]
Pebesma E (2018). "Simple Features for R: Standardized Support for
Spatial Vector Data." _The R Journal_, *10*(1), 439-446. doi:
10.32614/RJ-2018-009 (URL: https://doi.org/10.32614/RJ-2018-009), <URL:
https://doi.org/10.32614/RJ-2018-009>.

[[4]]
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R,
Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E,
Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi
K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to the
tidyverse." _Journal of Open Source Software_, *4*(43), 1686. doi:
10.21105/joss.01686 (URL: https://doi.org/10.21105/joss.01686).

[[5]]
Hijmans R (2022). _terra: Spatial Data Analysis_. R package version
1.5-18, <URL: https://rspatial.org/terra/>.

[[6]]
Bivand R, Nowosad J, Lovelace R (2021). _spData: Datasets for Spatial
Analysis_. R package version 2.0.1, <URL:
https://CRAN.R-project.org/package=spData>.

[[7]]
Bivand R, Wong DWS (2018). "Comparing implementations of global and
local indicators of spatial association." _TEST_, *27*(3), 716-748.
<URL: https://doi.org/10.1007/s11749-018-0599-x>.

Bivand RS, Pebesma E, Gomez-Rubio V (2013). _Applied spatial data
analysis with R, Second edition_. Springer, NY. <URL:
https://asdar-book.org/>.

[[8]]
Pebesma EJ (2004). "Multivariable geostatistics in S: the gstat
package." _Computers & Geosciences_, *30*, 683-691.

Gräler B, Pebesma E, Heuvelink G (2016). "Spatio-Temporal Interpolation
using gstat." _The R Journal_, *8*, 204-218. <URL:
https://journal.r-project.org/archive/2016/RJ-2016-014/index.html>.

[[9]]
Baddeley A, Rubak E, Turner R (2015). _Spatial Point Patterns:
Methodology and Applications with R_. Chapman and Hall/CRC Press,
London. <URL:
https://www.routledge.com/Spatial-Point-Patterns-Methodology-and-Applications-with-R/Baddeley-Rubak-Turner/9781482210200/>.

Baddeley A, Turner R, Mateu J, Bevan A (2013). "Hybrids of Gibbs Point
Process Models and Their Implementation." _Journal of Statistical
Software_, *55*(11), 1-43. doi: 10.18637/jss.v055.i11 (URL:
https://doi.org/10.18637/jss.v055.i11).

Baddeley A, Turner R (2005). "spatstat: An R Package for Analyzing
Spatial Point Patterns." _Journal of Statistical Software_, *12*(6),
1-42. doi: 10.18637/jss.v012.i06 (URL:
https://doi.org/10.18637/jss.v012.i06).

[[10]]
Douglas Nychka, Reinhard Furrer, John Paige, Stephan Sain (2021).
"fields: Tools for spatial data." R package version 13.3, <URL:
https://github.com/dnychka/fieldsRPackage>.